22 research outputs found
Programming with a Differentiable Forth Interpreter
Given that in practice training data is scarce for all but a small set of
problems, a core question is how to incorporate prior knowledge into a model.
In this paper, we consider the case of prior procedural knowledge for neural
networks, such as knowing how a program should traverse a sequence, but not
what local actions should be performed at each step. To this end, we present an
end-to-end differentiable interpreter for the programming language Forth which
enables programmers to write program sketches with slots that can be filled
with behaviour trained from program input-output data. We can optimise this
behaviour directly through gradient descent techniques on user-specified
objectives, and also integrate the program into any larger neural computation
graph. We show empirically that our interpreter is able to effectively leverage
different levels of prior program structure and learn complex behaviours such
as sequence sorting and addition. When connected to outputs of an LSTM and
trained jointly, our interpreter achieves state-of-the-art accuracy for
end-to-end reasoning about quantities expressed in natural language stories.
Comment: 34th International Conference on Machine Learning (ICML 2017)
Break it Down for Me: A Study in Automated Lyric Annotation
Comprehending lyrics, as found in songs and poems, can pose a challenge to
human and machine readers alike. This motivates the need for systems that can
understand the ambiguity and jargon found in such creative texts, and provide
commentary to aid readers in reaching the correct interpretation. We introduce
the task of automated lyric annotation (ALA). Like text simplification, a goal
of ALA is to rephrase the original text in a more easily understandable manner.
However, in ALA the system must often include additional information to clarify
niche terminology and abstract concepts. To stimulate research on this task, we
release a large collection of crowdsourced annotations for song lyrics. We
analyze the performance of translation and retrieval models on this task,
measuring performance with both automated and human evaluation. We find that
each model captures a unique type of information important to the task.
Comment: To appear in Proceedings of EMNLP 201
Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models
Existing dialogue models may encounter scenarios which are not
well-represented in the training data, and as a result generate responses that
are unnatural, inappropriate, or unhelpful. We propose the "Ask an Expert"
framework in which the model is trained with access to an "expert" which it can
consult at each turn. Advice is solicited via a structured dialogue with the
expert, and the model is optimized to selectively utilize (or ignore) it given
the context and dialogue history. In this work the expert takes the form of an
LLM. We evaluate this framework in a mental health support domain, where the
structure of the expert conversation is outlined by pre-specified prompts which
reflect a reasoning strategy taught to practitioners in the field. Blenderbot
models utilizing "Ask an Expert" show quality improvements across all expert
sizes, including those with fewer parameters than the dialogue model itself.
Our best model provides an improvement over baselines, approaching
human-level scores on "engagingness" and "helpfulness" metrics.
Comment: Accepted in Findings of the Association for Computational
Linguistics: ACL 202
Mind the Gap Between Conversations for Improved Long-Term Dialogue Generation
Knowing how to end and resume conversations over time is a natural part of
communication, allowing for discussions to span weeks, months, or years. The
duration of gaps between conversations dictates which topics are relevant and
which questions to ask, and dialogue systems which do not explicitly model time
may generate responses that are unnatural. In this work we explore the idea of
making dialogue models aware of time, and present GapChat, a multi-session
dialogue dataset in which the time between each session varies. While the
dataset is constructed in real time, progress on events in speakers' lives is
simulated in order to create realistic dialogues occurring across a long
timespan. We expose time information to the model and compare different
representations of time and event progress. In human evaluation we show that
time-aware models perform better in metrics that judge the relevance of the
chosen topics and the information gained from the conversation.
Comment: Accepted in the Findings of EMNLP 202
Hypothesis Only Baselines in Natural Language Inference
We propose a hypothesis only baseline for diagnosing Natural Language
Inference (NLI). Especially when an NLI dataset assumes inference is occurring
based purely on the relationship between a context and a hypothesis, it follows
that assessing entailment relations while ignoring the provided context is a
degenerate solution. Yet, through experiments on ten distinct NLI datasets, we
find that this approach, which we refer to as a hypothesis-only model, is able
to significantly outperform a majority class baseline across a number of NLI
datasets. Our analysis suggests that statistical irregularities may allow a
model to perform NLI in some datasets beyond what should be achievable without
access to the context.
Comment: Accepted at *SEM 2018 as long paper. 12 page